This report is for ETC5521 Assignment 1 by Team wallaby, comprising Helen Evangelina and Rahul Bharadwaj.

Introduction and motivation

Music, in a broad sense, is any art composed of sound. It expresses people’s thoughts and feelings, reflects the author’s life experience, and brings listeners both the enjoyment of beauty and an outlet for emotion. Music is also a form of social behavior through which people exchange feelings and life experiences.

In ancient times, music was played at court banquets and on scenic outings to add to the enjoyment. In modern times, classical music has a high barrier to entry and now attracts a relatively small audience, while pop music (the general name for popular songs, including rock, R&B, Latin, etc.) has developed its own character. Modern songs have gradually taken the top position in people’s hearts because they convey emotion and life experience so well, and listening to pop music has become one of the most common forms of everyday entertainment.

Nowadays, music plays an important role in people’s lives and helps them manage and improve their quality of life. As music fans, we not only enjoy music but also wonder how it moves people with simple tones, rhythms, timbres and words. How important is genre to a song’s performance? How much influence do the genre and the other attributes of a song have? What is it that we like about music: that it makes us dance or sing without thinking, or that it conveys our emotions and reflects our thoughts? These questions motivate our study. Many music streaming apps have sprung up in recent years; after careful consideration, our group decided to select Spotify as the research object. First, we introduce Spotify.

Spotify is a licensed music streaming platform supported by Warner Music, Sony, EMI and other major record companies around the world. It has more than 60 million users and is one of the world’s leading online music streaming platforms.

Because Spotify holds a large amount of user data, four interested users, Charlie Thompson, Josiah Parry, Donal Phipps and Tom Wolff, created the spotifyr package, which uses Spotify’s API to make it easier for everyone to explore their own preferences or mainstream listening habits. It is also the source of the data for our group assignment.

In addition to the spotifyr package, our data draws on a blog post by Kaylin Pavlik, in which six main categories (EDM, Latin, pop, R&B, rap, rock) are used to classify 5000 songs. Combining the two sources is very useful for studying the popularity of pop music.

Analysis questions

By doing this exploratory data analysis, we want to know:

Main Question: Which audio features affect the popularity of music artworks and contribute to the emergence of top songs?

Sub Questions:

  1. Since 1957, what are the audio features of the top artists who have made the most music artworks?

  2. Exploring the works of our favorite artist, Coldplay: for example, how much musical positiveness do their albums convey?

  3. There are plenty of modern music genres nowadays. What unique style or charm makes a genre stand out and become people’s first choice?

Data description

Data Source

The data for this report is part of the TidyTuesday challenge and comes from Spotify via the spotifyr package.

The variables in this dataset are X, track_id, track_name, track_artist, track_popularity, track_album_id, track_album_name, track_album_release_date, playlist_name, playlist_id, playlist_genre, playlist_subgenre, danceability, energy, key, loudness, mode, speechiness, acousticness, instrumentalness, liveness, valence, tempo and duration_ms. The time frame of collection is from 1957-01-01 to 2020-01-29.

Data collection methods: The spotifyr package can extract track audio characteristics and other related information from Spotify’s Web API in batches. For example, searching for an artist by name lists all of their albums or songs in seconds. The package also records the popularity metrics of tracks and albums, which makes it easy to examine the correlation between music popularity and music characteristics. Jon Harmon and Neal Grantham then used the spotifyr package, together with the genre classification of nearly 5000 songs from Kaylin Pavlik’s recent blog post, to produce the TidyTuesday dataset we need for this assignment.

We chose music works released between January 1, 1957 and January 29, 2020 by artists who can be found on Spotify.

Data structure

After reading the data into RStudio, our team used the skim() function to show the content and structure of the data. Here is a brief summary of the data structure:

Data summary
Name spotify_songs
Number of rows 32833
Number of columns 24
_______________________
Column type frequency:
character 10
numeric 14
________________________
Group variables None

Variable type: character

skim_variable n_missing complete_rate min max empty n_unique whitespace
track_id 0 1 22 22 0 28356 0
track_name 5 1 1 144 0 23449 0
track_artist 5 1 2 69 0 10692 0
track_album_id 0 1 22 22 0 22545 0
track_album_name 5 1 1 239 0 19742 0
track_album_release_date 0 1 4 10 0 4530 0
playlist_name 0 1 6 170 0 449 0
playlist_id 0 1 22 22 0 471 0
playlist_genre 0 1 3 5 0 6 0
playlist_subgenre 0 1 4 25 0 24 0

Variable type: numeric

skim_variable n_missing complete_rate mean sd p0 p25 p50 p75 p100 hist
X 0 1 16417.00 9478.22 1.00 8209.00 16417.00 24625.00 32833.00 ▇▇▇▇▇
track_popularity 0 1 42.48 24.98 0.00 24.00 45.00 62.00 100.00 ▆▆▇▆▁
danceability 0 1 0.65 0.15 0.00 0.56 0.67 0.76 0.98 ▁▁▃▇▃
energy 0 1 0.70 0.18 0.00 0.58 0.72 0.84 1.00 ▁▁▅▇▇
key 0 1 5.37 3.61 0.00 2.00 6.00 9.00 11.00 ▇▂▅▅▆
loudness 0 1 -6.72 2.99 -46.45 -8.17 -6.17 -4.64 1.27 ▁▁▁▂▇
mode 0 1 0.57 0.50 0.00 0.00 1.00 1.00 1.00 ▆▁▁▁▇
speechiness 0 1 0.11 0.10 0.00 0.04 0.06 0.13 0.92 ▇▂▁▁▁
acousticness 0 1 0.18 0.22 0.00 0.02 0.08 0.26 0.99 ▇▂▁▁▁
instrumentalness 0 1 0.08 0.22 0.00 0.00 0.00 0.00 0.99 ▇▁▁▁▁
liveness 0 1 0.19 0.15 0.00 0.09 0.13 0.25 1.00 ▇▃▁▁▁
valence 0 1 0.51 0.23 0.00 0.33 0.51 0.69 0.99 ▃▇▇▇▃
tempo 0 1 120.88 26.90 0.00 99.96 121.98 133.92 239.44 ▁▂▇▂▁
duration_ms 0 1 225799.81 59834.01 4000.00 187819.00 216000.00 253585.00 517810.00 ▁▇▇▁▁

The spotify_songs dataset is tabular data with 24 columns and 32,833 rows. Besides the row index “X”, its variables are “track_id”, “track_name”, “track_artist”, “track_popularity”, “track_album_id”, “track_album_name”, “track_album_release_date”, “playlist_name”, “playlist_id”, “playlist_genre”, “playlist_subgenre”, “danceability”, “energy”, “key”, “loudness”, “mode”, “speechiness”, “acousticness”, “instrumentalness”, “liveness”, “valence”, “tempo” and “duration_ms”.

Analysis and findings

Top artists

Now we clean the data, select the variables that are useful for our EDA, and retain the six major music genres (the proportions of other genres are very low and can be ignored). We then arrange the data from high to low by track popularity.
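The cleaning and ranking steps can be sketched in base R. This is only a minimal illustration on a made-up data frame, not the code used for the report: the object name `songs` and all values are assumptions, while the column names follow the dataset.

```r
# Toy stand-in for spotify_songs (values are invented for illustration).
songs <- data.frame(
  track_name       = c("A", "B", "C", "D", "E", NA),
  track_artist     = c("Queen", "Queen", "Drake", "Queen", "Drake", NA),
  track_popularity = c(80, 65, 90, 70, 55, 10),
  stringsAsFactors = FALSE
)

# Drop rows with a missing artist or track name.
clean <- songs[!is.na(songs$track_artist) & !is.na(songs$track_name), ]

# Arrange from most to least popular.
clean <- clean[order(-clean$track_popularity), ]

# Count tracks per artist and keep (at most) the 20 artists with the most tracks.
n_tracks    <- sort(table(clean$track_artist), decreasing = TRUE)
top_artists <- head(n_tracks, 20)
print(top_artists)
```

The same counts could equally be produced with dplyr verbs; base R is used here only to keep the sketch free of package dependencies.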

From the following table and figure, we can see that Queen, Martin Garrix and The Chainsmokers take first, second and third place respectively. The list also includes many other famous artists, such as Drake, Maroon 5 and Ed Sheeran.

The artists with the most songs are also shown in a bar plot. Our group decided to present the result in two forms: a table (using datatable) for comparing the numbers, and a figure for visual comparison. Together they give a clearer impression of the top 20 artists and of the gaps between them.

Top 20 Artists who wrote the most songs from 1941 to 2020

Next is a radar plot. Our group filtered artists whose popularity is greater than 95 and loaded the data into this type of plot. This makes the most popular artists clear at a glance, and lets music lovers see the characteristics of these top artists’ music artworks.

First, we can see that Maroon 5, The Weeknd, Roddy Ricch and KAROL G stand out in popularity, because the size of each pie chart represents the level of popularity. It is also clear that popular artists usually create songs in several genres rather than a single one. Finally, the styles of different artists’ music artworks differ greatly.

For example, the brightness of the colors shows that the energy of Maroon 5’s and Billie Eilish’s music artworks is not very high. This is not a shortcoming but a reflection of their style, which is lyrical and soft. Judging by the color of each fan-shaped boundary line, Roddy Ricch’s and Trevor Daniel’s works have the highest danceability, a feature that combines tempo, rhythm stability, beat strength and overall regularity.

Characteristics of top singers

Analyse our favorite artist

In this part, we take one artist as an example for a more detailed exploratory analysis using the “spotifyr” package. We chose Coldplay, our favorite artist.

First, we loaded all Coldplay albums available on Spotify and dropped the duplicates (some live tour albums duplicate existing ones). We then calculated the average valence of each album; the results are shown in the following table. According to the Spotify tracks documentation, the valence variable is measured from 0.0 to 1.0 and describes the musical positiveness conveyed by a track. Tracks with high valence sound more positive (e.g. happy, cheerful, euphoric), while tracks with low valence sound more negative (e.g. sad, depressed, angry). The highest average valence among these albums is 0.30 and the lowest is 0.18, which means Coldplay’s songs usually sound more negative than positive to the audience.

The musical positiveness of Coldplay’s albums
album_name valence
Everyday Life 0.30
Viva La Vida or Death and All His Friends 0.26
Mylo Xyloto 0.25
Parachutes 0.23
A Head Full of Dreams 0.23
X&Y 0.22
Ghost Stories 0.21
Love in Tokyo 0.19
A Rush of Blood to the Head 0.18
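The per-album averages in the table above come from a simple group-and-average step. The sketch below shows one way to do it in base R on invented track-level data; the object name `tracks`, the album labels and the valence values are assumptions, and the real values would come from Spotify via spotifyr.

```r
# Toy track-level data (invented values for illustration only).
tracks <- data.frame(
  album_name = c("Album X", "Album X", "Album Y", "Album Y", "Album Y"),
  valence    = c(0.20, 0.40, 0.10, 0.20, 0.30),
  stringsAsFactors = FALSE
)

# Mean valence per album, rounded to two decimals.
album_valence <- aggregate(valence ~ album_name, data = tracks, FUN = mean)
album_valence$valence <- round(album_valence$valence, 2)

# Sort from most to least positive.
album_valence <- album_valence[order(-album_valence$valence), ]
print(album_valence)
```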

Second, we made a density plot to show the range and density of valence for each album. From the following figure, we can see that “Everyday Life” has the widest range of valence; that is, this album contains a wide variety of emotions. Meanwhile, “A Rush of Blood to the Head” has a narrow range of valence, with its density centered on lower valence values. The audience would probably feel negative emotions such as sadness, depression and anger when listening to this album. This finding surprised us because “A Rush of Blood to the Head” is the second best album in “The Coldplay Albums Ranked”. So we decided to look at it in more depth.

Lastly, we analysed the sentiment of this album’s lyrics to see whether the valence of an album is associated with its lyrics. The average sentiment value of this album is -0.47 using the “afinn” lexicon. We also analysed the sentiment of the lyrics using the “bing” lexicon. The following table shows the most frequent words in this album and their sentiment, and the figure below shows the frequency of the words that appear more than once. Negative words appear more often than positive ones.

As a result, we can say that, in terms of both sound and lyrics, this album conveys negative emotions. This does not stop people from considering “A Rush of Blood to the Head” one of Coldplay’s best albums. The audience’s love for an album is evidently not entirely determined by the album’s positiveness.

The most frequent words in “A Rush of Blood to the Head”
word sentiment n
love positive 7
easy positive 4
fall negative 4
grace positive 4
miss negative 4
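The lexicon lookup behind a table like the one above can be sketched as follows. This is a hedged illustration, not the report's code: the five-word `lexicon` is a hypothetical stand-in whose labels mirror the table, the lyric line is invented, and a real analysis would join against the full “bing” lexicon.

```r
# Hypothetical mini-lexicon; a real analysis would use the full "bing" lexicon.
lexicon <- data.frame(
  word      = c("love", "easy", "grace", "fall", "miss"),
  sentiment = c("positive", "positive", "positive", "negative", "negative"),
  stringsAsFactors = FALSE
)

# Invented lyric line, lower-cased and split into words.
lyrics <- "I miss you love it is easy to fall and fall"
words  <- strsplit(tolower(lyrics), "[^a-z']+")[[1]]
tokens <- data.frame(word = words[words != ""], stringsAsFactors = FALSE)

# Keep only words found in the lexicon (inner join), then count them.
matched <- merge(tokens, lexicon, by = "word")
counts  <- as.data.frame(table(word = matched$word, sentiment = matched$sentiment))
counts  <- counts[counts$Freq > 0, ]
counts  <- counts[order(-counts$Freq), ]
print(counts)
```

Words that are not in the lexicon (here "you", "it", "to" and so on) simply drop out of the join, which is why only sentiment-bearing words appear in the result.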

Analyse the audio features

In this part, we analysed the audio features of all the songs in our dataset. The figure below shows how these features look in different genres. Here is a simple explanation of these features:

  • acousticness: A confidence measure from 0.0 to 1.0 of whether the track is acoustic.

  • danceability: Danceability describes how suitable a track is for dancing. A value of 0.0 is least danceable and 1.0 is most danceable.

  • duration_ms: The duration of the track in milliseconds. (And duration_s in seconds, rounded.)

  • energy: Energy is a measure from 0.0 to 1.0 and represents a perceptual measure of intensity and activity. Typically, energetic tracks feel fast, loud, and noisy.

  • instrumentalness: Predicts whether a track contains no vocals.

  • key: The key the track is in.

  • liveness: Detects the presence of an audience in the recording. Higher liveness values represent an increased probability that the track was performed live.

  • loudness: The overall loudness of a track in decibels (dB).

  • mode: Mode indicates the modality (major or minor) of a track, the type of scale from which its melodic content is derived. Major is represented by 1 and minor is 0.

  • speechiness: Speechiness detects the presence of spoken words in a track.

  • tempo: The overall estimated tempo of a track in beats per minute (BPM).

  • valence: A measure from 0.0 to 1.0 describing the musical positiveness conveyed by a track. Tracks with high valence sound more positive (e.g. happy, cheerful, euphoric), while tracks with low valence sound more negative (e.g. sad, depressed, angry).

The next three box plots show the differences in music attributes between Music Genres. First, each Music Genre is assigned a color, and the pairs are stored in one tibble called “COLORS”. This lets the Music Genres be clearly distinguished by color, so the specific characteristics of each can be judged from the box plots.

The first plot shows the relationship between Music Genre and Valence. Latin has the highest Valence and EDM the lowest. This suggests that Latin conveys musical positiveness most strongly, while EDM sounds more negative. The other four Music Genres show no obvious trend in this respect; their values mostly lie between 0.3 and 0.7.

Average valence by Music Genre

The second plot shows the relationship between Music Genre and Energy. Energy is a measure from 0.0 to 1.0 and represents a perceptual measure of intensity and activity. EDM has the highest Energy, while R&B has the lowest, which reflects the style of these two Music Genres. EDM mostly feels fast, loud and noisy to the listener, whereas R&B is mainly lyrical, slow and quiet, and brings less energy. Similarly, Rock has always been known for its flexible, bold expression and passionate rhythm, and it ranks second only to EDM.

Average Energy by Music Genre

Finally, this plot shows the relationship between Music Genre and Speechiness. Speechiness detects the presence of spoken words in a track: the more words are spoken in a song, the closer the value is to 1.0. This attribute is interesting because it indicates whether artists tend to express ideas through the words of a song or through its melody.

Looking at the plot, it is no surprise that Rap takes first place, since Rap is characterised by rapidly delivered rhyming lyrics over a mechanical, rhythmic backing. It is worth noting that Rock and Pop are the lowest, which suggests that these two genres tend to use melody or rhythm, rather than lyrics, to affect the audience.

Average speechiness by Music Genre

These three plots cover only part of the picture; many related attributes remain unexplored. Our aim was to present the most interesting parts, and the analysis can easily be extended by anyone interested.

Music genre and their popularity - by decade of release date

After reviewing the relations between audio features and Music Genres, we can now discuss the Music Genres in detail. The table below shows the distribution of genres in this dataset. The most frequent genre is “edm”, while “rock” appears least often.

Genres in the dataset
playlist_genre n
edm 6043
rap 5746
pop 5507
r&b 5431
latin 5155
rock 4951

The following figure shows the average popularity of songs released in different periods. To show the result clearly and make comparison easier, we split the result by genre.

  1. EDM has been popular since the 1970s, and the average popularity of EDM released in the past 50 years is around 40 or less. This suggests that EDM is not the mainstream music type nowadays.

  2. Latin and pop have been popular since the 1960s. The 1970s were the golden age of Latin songs, while the 1960s and 1970s were the golden age of pop. These old songs are still popular now.

  3. R&B has gone through ups and downs. Songs released from the 1980s to the 2000s are less popular than the others.

  4. Rap has been popular since the 1960s, and the oldest rap songs are still the most popular. Songs released in the 2000s have the lowest popularity now.

  5. The popularity of rock released in different periods is quite stable, although songs released from the 1960s to the 1990s are more popular than the others.
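The decade-level averages described here can be computed by truncating each release date to its year, flooring to the decade, and averaging popularity within each genre and decade. The sketch below is a minimal base-R version on invented rows (the values and the object name `songs` are assumptions). It relies on the fact, visible in the data summary, that track_album_release_date has between 4 and 10 characters, so some entries contain only a year.

```r
# Toy rows; release dates may be a full date or just a year, as in the dataset.
songs <- data.frame(
  playlist_genre           = c("edm", "edm", "rock", "rock", "rock"),
  track_album_release_date = c("2019-06-14", "2012", "1975-10-31", "1979", "1991-09-24"),
  track_popularity         = c(40, 30, 70, 60, 50),
  stringsAsFactors = FALSE
)

# The year is always the first four characters; floor it to the decade.
year <- as.integer(substr(songs$track_album_release_date, 1, 4))
songs$decade <- (year %/% 10) * 10

# Mean popularity for every genre-decade pair.
by_decade <- aggregate(track_popularity ~ playlist_genre + decade,
                       data = songs, FUN = mean)
print(by_decade)
```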

Correlation between popularity and audio features

Internal Relations between audio features

The correlations between song features help us explore why music artworks become popular. Each song has its own specific characteristics, but we can summarise them with ten musical attributes. The correlation plot shows three types of relation between attributes: negative correlation, positive correlation, or no correlation. This is important for our later analysis of the properties of music artworks.

For example, a song with high energy tends to have high loudness and is unlikely to be acoustic. Someone who likes more active songs or songs with higher valence could look for potential favorites among songs with high danceability, high energy and more vocal content. The correlation plot is therefore very useful for analysing songs or for choosing songs by preferred attributes. The remaining relationships can be explored later.
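A correlation matrix of this kind is the input to a correlation plot. The sketch below computes one with base R's `cor()` on simulated data. The three features are generated so that loudness rises and acousticness falls with energy, purely to mimic the pattern described; none of the numbers are Spotify values.

```r
set.seed(1)  # simulated data only; not the Spotify values

n <- 200
energy       <- runif(n)
loudness     <- -20 + 15 * energy + rnorm(n, sd = 2)               # built to rise with energy
acousticness <- pmin(pmax(1 - energy + rnorm(n, sd = 0.2), 0), 1)  # built to fall with energy
features     <- data.frame(energy, loudness, acousticness)

# Pairwise Pearson correlations between the numeric features.
corr <- cor(features)
print(round(corr, 2))
```

A matrix like `corr` can then be passed to a plotting function such as `corrplot::corrplot()` from the corrplot package cited in this report.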

Relationship between popularity and a certain audio feature

After describing the audio features, we now explore whether they contribute to higher popularity. First, we plotted each audio feature against popularity in the following figure. It shows that liveness has a negative relationship with popularity, and that there is no clear relationship between valence and popularity: a higher valence does not necessarily make a song more popular. This is consistent with our sentiment analysis.

## # A tibble: 43 x 3
## # Groups:   genre [6]
##    genre decade mean_popularity
##    <chr> <chr>            <dbl>
##  1 edm   1970              24  
##  2 edm   1980              37.4
##  3 edm   1990              39.3
##  4 edm   2000              20.7
##  5 edm   2010              35.1
##  6 edm   2020              40.3
##  7 latin 1960              26  
##  8 latin 1970              63.2
##  9 latin 1980              39.9
## 10 latin 1990              39.8
## # ... with 33 more rows

We are not sure whether the dot plots above directly reveal the relationship between popularity and audio features, so we also fitted a linear regression model. We filtered for songs with popularity greater than 0, since a popularity of 0 does not make sense in this model. The following table shows all the audio features with a p-value below 0.05. Danceability and valence have the largest positive estimates, so they contribute most to higher popularity. Acousticness, key, loudness, mode and tempo also have a positive relationship with popularity, while energy, instrumentalness, liveness and speechiness have a negative relationship, which is similar to the conclusion from the dot plots.

lm(popularity~features)
term estimate std.error statistic p.value
(Intercept) 744.18 15.80 47.10 0.00
acousticness 18.36 6.85 2.68 0.01
danceability 35.67 9.91 3.60 0.00
duration_ms 0.00 0.00 -14.03 0.00
energy -253.44 11.32 -22.39 0.00
instrumentalness -109.25 6.07 -18.01 0.00
key 0.95 0.35 2.69 0.01
liveness -23.39 8.47 -2.76 0.01
loudness 14.09 0.61 23.07 0.00
mode 6.61 2.58 2.57 0.01
speechiness -43.00 12.83 -3.35 0.00
tempo 0.18 0.05 3.72 0.00
valence 37.22 6.08 6.12 0.00
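A regression like the one summarised above could be fitted roughly as follows. This is a sketch on simulated data, so the coefficients are not comparable to the table: the three predictors (including the made-up `noise_feature`), the effect sizes and the object names are all assumptions chosen for illustration.

```r
set.seed(42)  # simulated data; coefficients are not comparable to the table above

n <- 500
songs <- data.frame(
  danceability  = runif(n),
  liveness      = runif(n),
  noise_feature = runif(n)  # hypothetical feature with no built-in effect
)
songs$track_popularity <- round(
  40 + 30 * songs$danceability - 20 * songs$liveness + rnorm(n, sd = 5)
)
songs$track_popularity[1:20] <- 0  # mimic tracks recorded with zero popularity

# Keep tracks with popularity above zero, as described in the text.
model_data <- songs[songs$track_popularity > 0, ]

fit <- lm(track_popularity ~ danceability + liveness + noise_feature,
          data = model_data)

# Coefficient table; keep the terms with a p-value below 0.05.
coefs <- as.data.frame(summary(fit)$coefficients)
names(coefs) <- c("estimate", "std.error", "statistic", "p.value")
significant <- coefs[coefs$p.value < 0.05, ]
print(round(significant, 2))
```

The broom package cited in this report offers `tidy(fit)` as a shortcut to the same coefficient table.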

Music components over time

In this section we explore the music characteristics over time. How are they changing? We then look at the top five artists to see whether the music characteristics of an individual artist change over time.

We have already discussed the different music components, the correlations between them, and their relationship with track_popularity. Another thing we can look at is the trend of the music components over time. Over time, more musical instruments and genres have been introduced, changing our taste in music. Analysing these trends is therefore interesting, as it helps us understand how the characteristics of music are changing. Are the music characteristics in 1957 similar to those in 2019?

To answer the research question “How are the music characteristics changing over time?”, we plot the values of the different music components against year, faceted by component. The figure below shows the trend of each music component over time.

The trend of music components over the years.

The music characteristics might change because new types of music and more modern music emerge. But do the music characteristics of an individual artist also change? It would be interesting to see whether a specific artist’s music characteristics also shift. Is the same artist making the same kind of music through time?

Next, we look at the top five artists and the trend of their music characteristics over time. We do not use track_popularity to determine the top five artists, because some of the most popular artists only have songs in a single year, so their music characteristics cannot be compared over time. Therefore, we use the number of tracks (track_name) instead.

A possible follow-up is to look at the genre of each artist and see whether it correlates with these characteristics.

track_artist Total
Queen 111
Martin Garrix 73
David Guetta 64
Logic 62
Hardwell 61

Unlike the other four artists, Queen has a different timeline. Therefore, for a clearer visualisation, we plot Queen separately from the others.

We also want to look at the relationship between track artist and musical characteristics (artist on the x-axis, characteristics on the y-axis).

We can also look at the relationship between year (or decade) and characteristics, and make a scatter plot matrix colored by artist.

# Add a decade column and reshape the audio features to long format.
a <- top5artists %>%
  mutate(decade = round(as.numeric(year) - 4.5, -1)) %>%
  pivot_longer(danceability:valence, names_to = "characteristics", values_to = "values")
a$year <- as.Date(a$year, format = "%Y")

aa <- characteristics_topartists %>%
  group_by(year, track_artist)

# Lines use the mean column of aa; points are the individual values in a.
ggplot() +
  geom_line(data = aa, aes(x = year, y = mean, color = track_artist)) +
  geom_point(data = a, aes(x = year, y = values, color = track_artist)) +
  facet_wrap(~characteristics, scales = "free")

The interesting characteristic here is valence, so we look at it more closely.

To see whether the characteristics are changing over time, we can look at the correlation between each characteristic and year.

lm(year~features)
term estimate std.error statistic p.value
(Intercept) 2011.04 0.08 24943.56 0
## 
## Call:
## lm(formula = year ~ acousticness, data = spotify_songs)
## 
## Coefficients:
##  (Intercept)  acousticness  
##    2011.0434        0.5351
## 
## Call:
## lm(formula = year ~ danceability, data = spotify_songs)
## 
## Coefficients:
##  (Intercept)  danceability  
##       2002.8          12.7

Conclusion

After the exploratory data analysis, our group has answers to these questions. First of all, audio features are correlated with track popularity, some positively and some negatively. However, the value of an artwork cannot be measured by numbers alone. The popularity of music artworks depends more on the artist’s own popularity, creative talent or singing ability, or on external factors such as world trends. Deliberately tailoring songs to particular audio features is not enough to make them successful.

Secondly, each top artist has their own artistic characteristics and is loved by specific groups of people. Top artists do not create music artworks by following the trend; instead, they create their own trends for the world.

As for the six music genres that stand out in modern music, each has its own characteristics. Because their styles are unique, it is hard to pin down the reasons for their success. What we can do is determine the genre of each song according to its style.

Finally, although Coldplay is one of the representative rock artists, their works contain more negative emotions. This is in line with the rebellious and critical spirit of rock music, a spirit that has long been respected by young people of different races. They stick to their own style, try unconventional musical approaches as far as possible, and reach people’s hearts with straightforward, profound and moving melodies. This also supports our analysis that the lyrics of Coldplay’s songs convey negative emotions, which does not hurt their popularity but helps make them top artists. In conclusion, track popularity depends more on the singer’s own ability and attitude than on audio features. The biggest role of audio features is to reflect the singer’s music style rather than to increase popularity.

The R packages we used in this report: Wickham (2016), Waring et al. (2020), Wei and Simko (2017), Arnold (2019), Xie, Cheng, and Tan (2020), Wickham, Hester, and Chang (2020), Wickham, Hester, and Francois (2018), Wickham et al. (2019), Wickham et al. (2020), Grolemund and Wickham (2011), Xie (2020), Zhu (2019), Robinson, Hayes, and Couch (2020), Auguie (2017), Parry and Barr (2020), Silge and Robinson (2016), Hvitfeldt (2020), Thompson (2017), Wilke (2020).

References

Arnold, Jeffrey B. 2019. Ggthemes: Extra Themes, Scales and Geoms for ’Ggplot2’. https://CRAN.R-project.org/package=ggthemes.

Auguie, Baptiste. 2017. GridExtra: Miscellaneous Functions for "Grid" Graphics. https://CRAN.R-project.org/package=gridExtra.

Grolemund, Garrett, and Hadley Wickham. 2011. “Dates and Times Made Easy with lubridate.” Journal of Statistical Software 40 (3): 1–25. http://www.jstatsoft.org/v40/i03/.

Hvitfeldt, Emil. 2020. Textdata: Download and Load Various Text Datasets. https://CRAN.R-project.org/package=textdata.

Parry, Josiah, and Nathan Barr. 2020. Genius: Easily Access Song Lyrics from Genius.com. https://CRAN.R-project.org/package=genius.

Robinson, David, Alex Hayes, and Simon Couch. 2020. Broom: Convert Statistical Objects into Tidy Tibbles. https://CRAN.R-project.org/package=broom.

Silge, Julia, and David Robinson. 2016. “Tidytext: Text Mining and Analysis Using Tidy Data Principles in R.” JOSS 1 (3). https://doi.org/10.21105/joss.00037.

Thompson, Charlie. 2017. “Spotifyr: R Wrapper for the ’Spotify’ Web API.” https://github.com/charlie86/spotifyr.

Waring, Elin, Michael Quinn, Amelia McNamara, Eduardo Arino de la Rubia, Hao Zhu, and Shannon Ellis. 2020. Skimr: Compact and Flexible Summaries of Data. https://CRAN.R-project.org/package=skimr.

Wei, Taiyun, and Viliam Simko. 2017. R Package "Corrplot": Visualization of a Correlation Matrix. https://github.com/taiyun/corrplot.

Wickham, Hadley. 2016. Ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag New York. https://ggplot2.tidyverse.org.

Wickham, Hadley, Mara Averick, Jennifer Bryan, Winston Chang, Lucy D’Agostino McGowan, Romain François, Garrett Grolemund, et al. 2019. “Welcome to the tidyverse.” Journal of Open Source Software 4 (43): 1686. https://doi.org/10.21105/joss.01686.

Wickham, Hadley, Romain François, Lionel Henry, and Kirill Müller. 2020. Dplyr: A Grammar of Data Manipulation. https://CRAN.R-project.org/package=dplyr.

Wickham, Hadley, Jim Hester, and Winston Chang. 2020. Devtools: Tools to Make Developing R Packages Easier. https://CRAN.R-project.org/package=devtools.

Wickham, Hadley, Jim Hester, and Romain Francois. 2018. Readr: Read Rectangular Text Data. https://CRAN.R-project.org/package=readr.

Wilke, Claus O. 2020. Ggridges: Ridgeline Plots in ’Ggplot2’. https://CRAN.R-project.org/package=ggridges.

Xie, Yihui. 2020. Knitr: A General-Purpose Package for Dynamic Report Generation in R. https://yihui.org/knitr/.

Xie, Yihui, Joe Cheng, and Xianying Tan. 2020. DT: A Wrapper of the Javascript Library ’Datatables’. https://CRAN.R-project.org/package=DT.

Zhu, Hao. 2019. KableExtra: Construct Complex Table with ’Kable’ and Pipe Syntax. https://CRAN.R-project.org/package=kableExtra.